Learning to Read L’Infinito: Handwritten Text Recognition with Synthetic Training Data

Abstract

Deep learning-based approaches to Handwritten Text Recognition (HTR) have shown remarkable results on publicly available large datasets, both modern and historical. However, it is often the case that historical manuscripts are preserved in small collections, most of the time with unique characteristics in terms of paper support, author handwriting style, and language. State-of-the-art HTR approaches struggle to obtain good performance on such small manuscript collections, for which few training samples are available. In this paper, we focus on HTR on small historical datasets and propose a new dataset, which we call Leopardi, with the typical characteristics of small historical collections, consisting of letters by the poet Giacomo Leopardi, and we devise strategies to deal with the training data scarcity scenario. In particular, we explore the use of carefully designed but cost-effective synthetic data for pre-training HTR models to be applied to single-author manuscripts. Extensive experiments validate the suitability of the proposed approach, and the Leopardi dataset will favor further research in this direction.

Similar resources

Self-training for Handwritten Text Line Recognition

Off-line handwriting recognition deals with the task of automatically recognizing handwritten text from images, for example from scanned sheets of paper. Due to the tremendous variations of writing styles encountered between different individuals, this is a very challenging task. Traditionally, a recognition system is trained by using a large corpus of handwritten text that has to be transcribe...

Active Learning for Historic Handwritten Text Recognition

This thesis examines the use of active learning for the task of handwritten text recognition in historical documents. Active learning is a machine learning paradigm which enables the learner to select the data that is being trained on. In domains where procuring annotated data is expensive but there are large amounts of unlabelled data available, active learning can lead to better models with t...

Using a Synthetic Character Database for Training Deep Learning Models Applied to Offline Handwritten Recognition

We present our current work on building a deep learning architecture for the offline handwritten character recognition problem. The proposed system is based on training a deep Convolutional Neural Network (CNN) to recognize handwritten characters, using a new synthetic character database derived from UNIPEN dataset. The presented approach is inspired in some successfully-used neural architectur...

Generating Synthetic Data for Text Recognition

Generating synthetic images is an art which emulates the natural process of image generation in a closest possible manner. In this work, we exploit such a framework for data generation in handwritten domain. We render synthetic data using open source fonts and incorporate data augmentation schemes. As part of this work, we release 9M synthetic handwritten word image corpus which could be useful...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-89131-2_31